Against Conditional Probability


1.  Background. 

If there is any distinction in the realm of statistics or
inductive logic or the manipulation of uncertainty that deserves
to be called "classical", it is the distinction between direct and
inverse inference.

Direct inference includes among its premises some
statement of statistical distribution, or relative frequency, or
chance, and concludes with a statement of probability concerning
a sample or a single case. From "50% of coin-tosses yield heads,"
in the absence of countervailing arguments, we conclude that the
probability is 0.5 that the next toss will yield heads.

Inverse inference takes as its premises a statement of
sample statistics concerning a sample from a population, together
with some other premises, and concludes with a statement of
statistical distribution, or relative frequency, or chance,
applicable to the population as a whole.

Both direct and inverse inference are characterized by
nonmonotonicity. Adding to the premises may undermine a
conclusion in either case. This was recognized explicitly by R. A.
Fisher [1936, p. 254]: "There is one peculiarity of uncertain
inference which often presents a difficulty to mathematicians
trained only in the technique of rigorous deductive argument,
namely that our conclusions are arbitrary, and therefore invalid,
unless all the data, exhaustively, are taken into account. In
rigorous deductive reasoning we may make any selection from
the data, and any certain conclusions which may be deduced
from this selection will be valid, whatever additional data we
have at our disposal." Even so, direct inference has been
regarded as relatively unproblematic.

Given as a premise that the chance of heads on the toss
of a coin is a half, we confidently say that the probability of
heads on the first toss is a half. Given a further premise to the
effect that three of the first four tosses yielded heads, we
recompute the probability of heads on the first toss to be 3/4.


2.  Bayes' Theorem

Inverse inference has often been associated with Bayes'
theorem. For example, one way of getting at the parameter p
characterizing the proportion of black balls in an urn is to draw
a sample, and to apply Bayes' theorem. Bayes' theorem,
however, requires as input a prior distribution for the parameter
p, which may be difficult to justify in terms of frequencies or
chances. Thus Jerzy Neyman, another founding father of
modern statistics, writes [Neyman, 1957, p. 7]: "persons who
would like to deal only with classical probabilities, having their
counterparts in the really observable frequencies, are forced to
look for a solution of the problem of estimation other than by
means of the theorem of Bayes." This is not to say that Bayes'
theorem is never applicable. As R. A. Fisher saw clearly, there
are many situations in which Bayes' theorem can easily be
construed in terms of direct inference. In [Fisher, 1930] he
notes that drawing from a super-population in which the
parameter of interest (say p) has a known distribution, and
then getting a posterior distribution for p, is a perfectly
direct argument. For inverse inference proper -- that is,
inference whose uncertainty is not based on known frequencies,
but on subjective probabilities -- Fisher has nothing but
contempt [1930]: "In fact, the argument runs somewhat as
follows: a number of useful but uncertain judgments can be
expressed with exactitude in terms of probability; our judgments
respecting causes or hypotheses are uncertain, therefore our
rational attitude towards them is expressible in terms of
probability." Neyman's attitude is even less tolerant.

Fisher and Neyman were reacting against the use of the
so-called axiom of Bayes that stipulated the use of uniform
priors. Their goal, which has informed most of modern
statistical practice, was to do without priors. Since their day,
however, inverse inference proper has become (almost)
respectable again. This is particularly so in philosophy, in which
inductive inference is often supposed to take place only by means
of Bayes' theorem, and in AI, in which the updating or
modification of uncertainty is assumed to take place only by
means of conditionalization. I claim that there is a serious
conflict between direct inference and inverse inference proper --
that is, the use of conditionalization. Since the original point of
inverse inference was to serve the interest of direct inference by
providing statistical premises, we should hold onto direct inference
and abandon conditionalization except in those cases (which are
many) in which it can be reduced to direct inference.

3.  Direct Inference

Direct inference has always seemed so obvious that
almost nobody has made a serious attempt to reduce it to rules.
(Reichenbach [1949] is an exception; my contrasting view
appeared in [1961].) Two simple rules will suffice for our
purposes here; they are the difference rule and the strength
rule. Essentially, they are rules for choosing a reference class.

We will generalize the notion of probability very slightly
to accommodate the obvious fact that we don't know most general
frequencies or distributions exactly; we will therefore represent
both probabilities and our knowledge of frequencies by intervals.
We will say that two intervals differ if they are not identical and
neither is included in the other. If one is included in the other
we will say that it is stronger than the other.
The difference rule says that if you have two possible
reference classes, and they are characterized by different
frequency intervals, then, if one reference class is included in the
other, it may be the right reference class, but if neither is
included in the other we may have to look elsewhere for a
reference class. Thus if you know of a card that it is black, the
probability that it is a spade is determined by the frequency of
spades among black cards, not among cards in general.

The strength rule says that if you have two possible
reference classes, and neither is ruled out as a reference class by
differing from some other reference class, then the one about
which our statistical knowledge is stronger is the better reference
class. As an extreme example, I know that the frequency of
heads in the set of tosses consisting of the singleton of the next
toss is in the closed interval [0,1], and that's all I know about
it. But I know that among coin-tosses in general, heads occur
with a frequency very close to a half. It is the latter that
constitutes a better reference class.
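
The two interval relations used by these rules can be sketched in a few lines of Python. This is only an illustration of the definitions of "differ" and "stronger" given above, not a full account of reference-class selection; the class name Interval, the function names, and the interval [0.49, 0.51] standing in for "very close to a half" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def includes(self, other: "Interval") -> bool:
        """True if `other` lies entirely inside this interval."""
        return self.lo <= other.lo and other.hi <= self.hi


def differ(a: Interval, b: Interval) -> bool:
    """Two intervals differ if they are not identical and neither includes the other."""
    return a != b and not a.includes(b) and not b.includes(a)


def stronger(a: Interval, b: Interval) -> bool:
    """`a` is stronger than `b` if `a` is included in `b` (and they are not identical)."""
    return a != b and b.includes(a)


# The coin example: the singleton of the next toss versus coin-tosses in general.
next_toss = Interval(0.0, 1.0)
coin_tosses = Interval(0.49, 0.51)  # hypothetical stand-in for "very close to a half"

print(differ(coin_tosses, next_toss))    # the intervals do not differ
print(stronger(coin_tosses, next_toss))  # knowledge about coin-tosses is stronger
```

On this reading the two classes do not conflict, and the general class of coin-tosses wins because its interval is the narrower one.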

This characterization of direct inference leaves out a
number of important aspects, but they are not essential for our
purposes here. Details can be found in [1983].


Here is an example in which the conflict between direct
inference and conditionalization comes out clearly.

Suppose that we are running a factory, and we use a
certain type of instrument to test our product. The
manufacturer of the instruments certifies, on the basis of
extensive testing, that his instruments are subject to error
greater than ε exactly 20% of the time. Of course we
understand "exactly 20%" to mean very close to 20%. We have
no reason to doubt this report.

Pick an item off the assembly line. Test it with the
instrument. The probability that the true value differs from the
reading by more than ε is clearly .20. This is just direct inference,
making use of the fact that we know the frequency of errors of
magnitude ε.

Now let us suppose that we seem to notice that some
instruments are more accurate than others. Of course that is
bound to be the case, and does not impugn the manufacturer's
claim that the error rate is 20%. We are inspired to look into
the matter more deeply: we note that the instruments are
inspected by three different inspectors, A, B, and C. We form
the hypothesis that the accuracy of the instrument is related to
the identity of the inspector who passed it. We take a sample of
400 readings made on each kind of instrument, and
compare them with the readings made by our super-accurate tester.

The number of readings in error by more than ε, and
the ratio of such readings to the total number, is presented in
the following table.

type      trials   errors   rate
A          400      108     .270
B          400       52     .130
C          400       70     .175

We may also compute ratios in broader classes from the same
data.

type      trials   errors   rate
A∪B        800      160     .200
A∪C        800      178     .2225
B∪C        800      122     .1525
A∪B∪C     1200      230     .192

What do we do with this data? Well, we can use
direct inference to draw conclusions about the general classes of
measurements A, B, and C, and their combinations. Taking .95
as an acceptance level, we note that that corresponds to ±2
standard deviations on the normal distribution. In the present
case we can disregard the difference between the binomial and
the normal distribution. The mean of the difference between the
sample mean and the population mean is 0. We thus get the
following confidence intervals for the error rates in the classes
tested:

type      number   standard    interval
                   deviation
A          400      .040       [.230, .310]
B          400      .040       [.090, .170]
C          400      .040       [.135, .215]
A∪B        800      .028       [.172, .228]
A∪C        800      .028       [.194, .251]
B∪C        800      .028       [.124, .181]
A∪B∪C     1200      .023       [.169, .215]
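
The tabulated intervals can be reproduced numerically. The sketch below assumes that the figure in the "standard deviation" column is two binomial standard deviations computed with the manufacturer's rate of .20, and that each interval is the sample rate plus or minus that figure; this is inferred from the numbers and is not stated in the text. The error counts 52 for B and 122 for B∪C are likewise derived from the union totals (160 - 108 and 52 + 70).

```python
from math import sqrt

P0 = 0.20  # manufacturer's error rate; assumed basis for the standard deviation

# (trials, errors) per class; 52 and 122 are derived from the union totals
samples = {
    "A": (400, 108),
    "B": (400, 52),
    "C": (400, 70),
    "A∪B": (800, 160),
    "A∪C": (800, 178),
    "B∪C": (800, 122),
    "A∪B∪C": (1200, 230),
}


def interval(trials: int, errors: int, p0: float = P0) -> tuple[float, float]:
    """Sample rate plus or minus two standard deviations (normal approximation)."""
    rate = errors / trials
    half_width = 2 * sqrt(p0 * (1 - p0) / trials)
    return rate - half_width, rate + half_width


for name, (trials, errors) in samples.items():
    lo, hi = interval(trials, errors)
    print(f"{name:8s} {trials:5d}  [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the printed bounds agree with the table to three decimal places.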

Assume that the sampling procedures leave nothing to be desired.
Note that our results do not impugn the manufacturer's claim
that the relative frequency of errors is .20. At the same time,
when we now pick an item from the assembly line and measure
it, if we notice that we have an instrument of type A, we will
suppose that the probability is [.23, .31] that the reading is in
error by more than ε. Similarly, if we use an instrument of
type B, we find the probability of this amount of error to be
only [.09, .17].


On the other hand, if we are looking over old records,
and the inspector of the instrument with which a measurement
was made was not recorded, it seems right to use the old
probability of error of .20. Someone might want to argue that
since we have this new information about A∪B∪C -- namely,
that the frequency of error is between .169 and .215 -- we
should use that interval. But why? That doesn't disagree with
the manufacturer's error frequency, and no doubt his frequency
is based on vastly more information than is ours. It seems to be
just a waste of good information to use our rough estimate in
place of his refined one. At any rate, this is the intuition on
which the strength rule is based.

If these are our intuitions, we must reject
conditionalization in this case. Take the probabilities we get from
direct inference to constrain our degrees of belief; let our degrees
of belief be given by a belief function BF satisfying the axioms of
probability. Then by conditionalization and the principle of total
probability we have

    BF(E|A∪B∪C) = BF(E|A∪B)BF(A∪B) + BF(E|C)BF(C).

Since the probability of E given A∪B is .200, by the strength
rule, and the same is true of the probability of E given A∪B∪C,
and since the probability of E|C is positive, this identity can only
be satisfied if BF(C) is 0. Again, by conditionalization and the
principle of total probability we have

    BF(E|A∪B∪C) = BF(E|A∪C)BF(A∪C) + BF(E|B)BF(B),

so that, by the same argument, we have BF(B) = 0.

From this it follows that BF(A) = 1, since BF(B) =
BF(C) = 0. So we have BF(E|A∪B∪C) = BF(E|A) ∈ [.230, .310],
contrary to our assumption that BF was to be constrained by our
probability intervals.
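
The algebra behind each of these steps can be checked directly. The sketch below is an illustration only: it solves the total-probability identity for the weight of the singled-out class, under the assumption that the conditional value for that class is different from .200. The function name forced_weight is hypothetical; the sample values .09, .13, and .17 are taken from the interval for type B.

```python
def forced_weight(total: float, rest: float, part: float) -> float:
    """Solve  total = rest * (1 - w) + part * w  for the weight w.

    `total` is the value for the whole class, `rest` the value for the
    union of the other classes, `part` the value for the singled-out class.
    The solution is unique only when part != rest.
    """
    if part == rest:
        raise ValueError("identity holds for every w when part == rest")
    return (total - rest) / (part - rest)


# With .200 for both the whole class and the two-class union, any
# conditional value for B inside [.09, .17] forces BF(B) to be 0.
for e_given_b in (0.09, 0.13, 0.17):
    print(e_given_b, forced_weight(0.200, 0.200, e_given_b))
```

When the whole class and the union share the same value, the numerator vanishes, so the weight is zero whatever the differing conditional value may be. If the conditional value were exactly .200, the identity would place no constraint on the weight at all.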

We must give up something. The principle of total
probability is hard to give up: every frequency function that
applies to the world satisfies the principle of total probability.
But conditionalization, applied to a belief function, describes how
our beliefs are supposed to change in response to incoming
evidence. And there is nothing sacrosanct about that. Of course
conditionalization will apply sometimes. If we knew what
proportion of the measurements were given by each of the three
types of instruments (so that we had probabilities on which to
base BF(A), BF(B), and BF(C)) then of course we would not
apply the strength rule, but rather an appropriate weighted
average in obtaining BF(E|A∪B∪C). And this could (and should)
be based on a direct inference.
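
As a rough illustration of such a weighted average, suppose the proportions of measurements made with each type of instrument were known. The proportions below are hypothetical, and weighting the interval endpoints separately is only one simple way of carrying the intervals through the average; the text does not specify how the average is to be formed.

```python
# Intervals for the three types, taken from the table of confidence intervals.
intervals = {"A": (0.230, 0.310), "B": (0.090, 0.170), "C": (0.135, 0.215)}

# Hypothetical known proportions of measurements made with each type.
proportions = {"A": 0.50, "B": 0.25, "C": 0.25}


def mixture_interval(intervals, proportions):
    """Weight the lower and upper bounds by the known proportions."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    lo = sum(proportions[k] * intervals[k][0] for k in intervals)
    hi = sum(proportions[k] * intervals[k][1] for k in intervals)
    return lo, hi


lo, hi = mixture_interval(intervals, proportions)
print(f"[{lo:.3f}, {hi:.3f}]")
```

The point of the sketch is only that, once the proportions are themselves backed by frequencies, the combined value is obtained from statistical premises rather than from a prior.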


One way of dealing with the conflict of the preceding
section is to insist that once we have tested a number of the
measurements made in our factory, we should use those statistics
for our probabilities. And thus, for example, to use [.169, .215]
rather than .200 for BF(E|A∪B∪C). This won't do for two
reasons. Suppose we had tested the three types of instruments
and not found any evidence that the expected frequency of errors
differed. Surely in that case we would feel free to continue using
the manufacturer's error rate of .200. Furthermore, we always
have very specific statistical information concerning the error
rate in future measurements. Thus I know that the error rate
among future measurements made by me using instruments of
type A is in the interval [0,1]. And this is probably all I know
about that class of measurements. Surely I should not be
required to take the probability of error to be [0,1].

The response of the subjectivist to this sort of example
is two-fold. First the subjectivist will assert that probabilities
are belief functions, and that therefore intervals won't do. Given
any measurement performed with an instrument in A∪B∪C,
BF(A), BF(B), and BF(C) -- the degree of belief that it was
performed with an instrument of type A, B, or C, respectively
-- are all real valued and add up to 1. Similarly, the
conditional probability BF(E|A) is real valued: by the preceding


kind of argument

    BF(E|A) = Σ_i BF(E|A_i)BF(A_i),


where A_i is the condition that an instrument of type A is used
and the error rate of instruments of type A is in the i'th
subinterval of [.230, .310]. Naturally, we can make these
subintervals as small as we want. So the subjectivist thinks I
can do things that I don't think I can do, like making all
probabilities precise.

On the other hand, the subjectivist can offer an
independent argument for conditionalization. Since I have
accepted total probability, if I can be compelled to accept
conditionalization as well, I shall find myself having to reject
direct inference -- or at least the strength rule. The argument
goes like the standard Dutch book arguments for the probability
axioms. Roughly it is this: if you allow conditional bets, but do
not adjust your beliefs in accordance with the principle of
conditionalization, then your unfriendly bettor will bet on X at
your odds, on X & Y at your odds, and will make a conditional
bet on X, conditional on the occurrence of Y, at those odds you
will offer once you have observed Y. If you do not obey the
principle of conditionalization, these need not be the same odds,
and the unfriendly bettor will be able to win for sure. Thus, it
is claimed, the principle of conditionalization has the same degree
of soundness as any other principle of probability.

To this I respond that Dutch book arguments are not
very persuasive anyway. It is a matter of deductive
self-preservation -- and has nothing to do with degrees of belief
-- not to make a set of bets on which you are bound to lose
money. But it is also not always appropriate to look at
conditional bets. In the example of the instrument, the numbers
to which I am led seem perfectly reasonable, even though they
are not consistent with any probabilistic belief function whose
range includes prior beliefs about which instrument is used.
Finally, as one can see from this kind of example, the
subjectivist requires that the value of the belief function be
determined for all possible future contingencies -- and then never
changed. (Conditionalization does not involve a change of the
basic belief function: P'(E) = P(E & A)/P(A), for example, so
that while one's initial absolute belief function is updated, one is
never allowed to change one's mind.)

In terms of the book-making metaphor, the bookie
must post odds on all possible contingencies, and then take those
odds to determine all his conditional bets; whatever happens,
whatever new evidence there is, the bookie need merely look up
the corresponding conditional odds in his initial table. He cannot
change his odds. I suggest that this is overly rigid. One should,
perhaps rarely, be willing to change one's odds in a fundamental
way. Learning only by conditionalization implies an excessively
narrow view of learning. Where there is conflict with direct
inference, direct inference should prevail, and conditionalization
should go hang.


note 


1. This example is due essentially to Levi [1977] and
[1980], where it is alleged to show the incoherence of the
strength rule. I have profited also from extensive discussions
with Levi on these matters.


bibliography 

Fisher, Ronald A. [1930] "Inverse Probability," Proceedings of
the Cambridge Philosophical Society, 528-535.

Fisher, Ronald A. [1936] "Uncertain Inference," Proceedings of
the American Academy of Arts and Sciences 71, 245-254.

Kyburg, Henry E. [1961] Probability and the Logic of Rational
Belief, Wesleyan University Press, Middletown.

Kyburg, Henry E. [1983] "The Reference Class," Philosophy of
Science 50, 374-397.

Levi, Isaac [1977] "Direct Inference," Journal of Philosophy 74,
pp. 5-29.

Levi, Isaac [1980] The Enterprise of Knowledge, MIT Press,
Cambridge.


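Neyman, Jerzy [1957] "'Inductive Behavior' as a Basic Concept of
Philosophy of Science," Review of the International Statistical
Institute 25, pp. 7-22.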


